Abstract

In this paper we prove several results on normal forms for linear displacement context-free grammars. The
results themselves are rather simple and use well-known techniques, but they are extensively used in more
complex constructions. Therefore this article mostly serves educational and reference purposes.


1 Displacement context-free grammars 

Displacement context-free grammars (DCFGs) are a reformulation of well-nested multiple context-free grammars;
we use the displacement context-free language notation for well-nested MCFLs. In this draft we use tuple
notation, that is, we consider tuples of strings instead of gapped strings. Let $\Sigma$ be a finite alphabet,
then $\Sigma^*$ denotes the set of all words with letters in $\Sigma$, $\varepsilon$ being the empty string.
When $\Sigma$ is fixed, $\Theta_k$ denotes the set of all tuples of the form $(u_0, \ldots, u_k)$,
$u_i \in \Sigma^*$, and $\Theta = \bigcup_{k \in \mathbb{N}} \Theta_k$. We call $k$ the rank of the tuple
$u = (u_0, \ldots, u_k)$ and denote it by $\mathrm{rk}(u)$. The length $|u|$ of a tuple $u$ is the sum of the
lengths of all its components; we denote by $\Theta^{(l)}$ the set of all tuples of length $l$ and also write
$\Theta^{(\ge l)}$ for $\bigcup_{j \ge l} \Theta^{(j)}$. The notation $\Theta^{(\le l)}$ is understood
analogously.

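On the set of tuples we define the concatenation operation $\cdot \colon \Theta_i \times \Theta_j \to \Theta_{i+j}$
and the countable set of intercalation operations $\odot_l \colon \Theta_i \times \Theta_j \to \Theta_{i+j-1}$:

$(x_0, \ldots, x_i) \cdot (y_0, \ldots, y_j) = (x_0, \ldots, x_i y_0, \ldots, y_j)$

$(x_0, \ldots, x_i) \odot_l (y_0, \ldots, y_j) = (x_0, \ldots, x_{l-1} y_0, y_1, \ldots, y_j x_l, \ldots, x_i)$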
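
The two operations can be tried out on small examples. The following is a minimal Python sketch, assuming a
tuple $(u_0, \ldots, u_k)$ is encoded as a Python tuple of strings; the names `rank`, `concat` and
`intercalate` are illustrative and do not come from the paper.

```python
from typing import Tuple

Word = Tuple[str, ...]  # (u_0, ..., u_k); the rank is len(u) - 1


def rank(u: Word) -> int:
    """Rank of a tuple: the number of its gaps."""
    return len(u) - 1


def concat(x: Word, y: Word) -> Word:
    """x . y: glue the last component of x to the first component of y."""
    return x[:-1] + (x[-1] + y[0],) + y[1:]


def intercalate(x: Word, y: Word, l: int) -> Word:
    """x (.)_l y: replace the l-th gap of x (between x_{l-1} and x_l) by y."""
    if not 1 <= l <= rank(x):
        raise ValueError("gap index out of range")
    if len(y) == 1:
        # y has rank 0, so the two neighbours of the gap are merged around it
        return x[:l - 1] + (x[l - 1] + y[0] + x[l],) + x[l + 1:]
    return x[:l - 1] + (x[l - 1] + y[0],) + y[1:-1] + (y[-1] + x[l],) + x[l + 1:]
```

For instance, `intercalate(("a", "b", "c"), ("x", "y"), 1)` fills the first gap and yields
`("ax", "yb", "c")`, a tuple of rank 2 + 1 - 1 = 2, in agreement with the signature of $\odot_l$.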

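Let $N$ be a finite ranked set of nonterminals and $\mathrm{rk}\colon N \to \mathbb{N}$ be the rank function.
Let $\mathrm{Op}_k = \{\cdot, \odot_1, \ldots, \odot_k\}$; the set $\mathrm{Tm}_k(N, \Sigma)$ of $k$-correct
terms is defined as follows:

1. $\forall j \le k \; (\Theta_j \subset \mathrm{Tm}_k(N, \Sigma))$.

2. If $\alpha, \beta \in \mathrm{Tm}_k$ and $\mathrm{rk}(\alpha) + \mathrm{rk}(\beta) \le k$, then
$(\alpha \cdot \beta) \in \mathrm{Tm}_k$, $\mathrm{rk}(\alpha \cdot \beta) = \mathrm{rk}(\alpha) + \mathrm{rk}(\beta)$.

3. If $j \le k$, $\alpha, \beta \in \mathrm{Tm}_k$, $\mathrm{rk}(\alpha) + \mathrm{rk}(\beta) \le k + 1$,
$\mathrm{rk}(\alpha) \ge j$, then $(\alpha \odot_j \beta) \in \mathrm{Tm}_k$,
$\mathrm{rk}(\alpha \odot_j \beta) = \mathrm{rk}(\alpha) + \mathrm{rk}(\beta) - 1$.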
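
The rank constraints of this definition can be checked mechanically. The sketch below is only an
illustration: the classes `Leaf`, `Concat`, `Intercal` and the function `term_rank` are hypothetical names,
a leaf is represented by its rank alone, and the extra requirement $j \ge 1$ is our assumption that gaps
are numbered from 1.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Leaf:
    """A leaf of a term, represented only by its rank."""
    rank: int


@dataclass(frozen=True)
class Concat:
    left: "Term"
    right: "Term"


@dataclass(frozen=True)
class Intercal:
    left: "Term"
    right: "Term"
    j: int


Term = Union[Leaf, Concat, Intercal]


def term_rank(t: Term, k: int) -> int:
    """Return rk(t) if t is k-correct; raise ValueError otherwise."""
    if isinstance(t, Leaf):
        if not 0 <= t.rank <= k:
            raise ValueError("leaf rank exceeds k")
        return t.rank
    a, b = term_rank(t.left, k), term_rank(t.right, k)
    if isinstance(t, Concat):
        # clause 2: rk(a) + rk(b) <= k
        if a + b > k:
            raise ValueError("concatenation exceeds rank k")
        return a + b
    # clause 3; assumption: gaps are numbered from 1, hence also j >= 1
    if not (1 <= t.j <= k and a + b <= k + 1 and a >= t.j):
        raise ValueError("intercalation is not k-correct")
    return a + b - 1
```

With $k = 2$ the term `Intercal(Concat(Leaf(1), Leaf(1)), Leaf(0), 2)` has rank 1, while with $k = 1$ the
inner concatenation already violates clause 2.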

We assume that all the operation symbols are left-associative and that concatenation has greater priority than
intercalation. We may also omit the $\cdot$ symbol, so the notation $A \odot_2 BC \odot_1 D$ means
$(A \odot_2 (B \cdot C)) \odot_1 D$.

Let $\mathrm{Var} = \{x_1, x_2, \ldots\}$ be a countable ranked set of variables, such that for every $k$ there
is an infinite number of variables having rank $k$. A context $C[x]$ is a term where a variable $x$ occurs in a
leaf position; the rank of $x$ must respect the constraints of term construction. Provided
$\beta \in \mathrm{Tm}_k$ and $\mathrm{rk}(x) = \mathrm{rk}(\beta)$, $C[\beta]$ denotes the result of
substituting $\beta$ for $x$ in $C$. A valuation function $\nu$ assigns words of rank $l$ to the variables of
rank $l$ for any $l \le k$ in an arbitrary way. It also maps all the elements of $\Theta$ to themselves.
Interpreting the connectives from $\mathrm{Op}_k$ as the corresponding binary operations, we are able to
calculate the valuation of every ground term (i.e. a term containing no nonterminal occurrences). It is easy to
prove that $\mathrm{rk}(\alpha) = \mathrm{rk}(\nu(\alpha))$ holds for every $\alpha$. The set of $k$-correct
ground terms is denoted by $\mathrm{GrTm}_k(\Sigma)$.

Definition. A $k$-displacement context-free grammar ($k$-DCFG) is a quadruple $G = (N, \Sigma, P, S)$, where
$\Sigma$ is a finite alphabet, $N$ is a finite ranked set of nonterminals and $\Sigma \cap N = \emptyset$,
$S \in N$ is a start symbol such that $\mathrm{rk}(S) = 0$ and $P$ is a set of rules of the form
$A \to \alpha$. Here $A$ is a nonterminal and $\alpha$ is a term from $\mathrm{Tm}_k(N, \Sigma)$ such that
$\mathrm{rk}(A) = \mathrm{rk}(\alpha)$.

Definition. The derivability relation $\vdash_G \subseteq N \times \mathrm{Tm}_k$ associated with the grammar
$G$ is the smallest reflexive transitive relation such that the facts $(B \to \beta) \in P$ and
$A \vdash C[B]$ imply that $A \vdash C[\beta]$ for any context $C$. Let
$L_G(A) = \{\nu(\alpha) \mid A \vdash_G \alpha,\ \alpha \in \mathrm{GrTm}_k\}$ denote the set of words which
are derivable from a nonterminal $A$; then $L(G) = L_G(S)$.

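Example. A $k$-DCFG $G_k = (\{S, T\}, \{a_i, b_i \mid i \in [0; k]\}, P, S)$, where the set $P$ is defined
below, derives the language $L_k = \{a_0^m b_0^m \ldots a_k^m b_k^m\}$.

$S \to \underbrace{(\ldots(T \odot_1 \varepsilon)\ldots)}_{(k-1)\ \text{times}} \odot_1 \varepsilon$

$T \to a_0 (T \odot_1 (b_0, a_1) \ldots \odot_k (b_{k-1}, a_k)) b_k$

$T \to \underbrace{(\varepsilon, \ldots, \varepsilon)}_{(k+1)\ \text{times}}$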
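
To see how this grammar builds its words, one may simulate the derivation directly. The sketch below is an
illustration under our own encoding: tuples are Python tuples of strings, the letters $a_i$, $b_i$ are
written as the two-character strings `"a0"`, `"b0"`, and so on, and `derive` is a hypothetical helper that
applies the recursive rule for $T$ exactly `m` times before closing all gaps with $\varepsilon$.

```python
from typing import Tuple

Word = Tuple[str, ...]


def concat(x: Word, y: Word) -> Word:
    return x[:-1] + (x[-1] + y[0],) + y[1:]


def intercalate(x: Word, y: Word, l: int) -> Word:
    if len(y) == 1:
        return x[:l - 1] + (x[l - 1] + y[0] + x[l],) + x[l + 1:]
    return x[:l - 1] + (x[l - 1] + y[0],) + y[1:-1] + (y[-1] + x[l],) + x[l + 1:]


def derive(k: int, m: int) -> str:
    """Word of L_k obtained with m applications of the recursive T-rule."""
    t: Word = ("",) * (k + 1)            # T -> (eps, ..., eps), k+1 components
    for _ in range(m):
        # T (.)_1 (b_0, a_1) ... (.)_k (b_{k-1}, a_k), read left-associatively;
        # each step inserts a rank-1 tuple, so the gap numbering is preserved
        for i in range(1, k + 1):
            t = intercalate(t, (f"b{i - 1}", f"a{i}"), i)
        t = concat(concat(("a0",), t), (f"b{k}",))
    for _ in range(k):                   # S: fill each of the k gaps with eps
        t = intercalate(t, ("",), 1)
    return t[0]
```

Under this encoding `derive(1, 2)` gives `"a0a0b0b0a1a1b1b1"`, that is, $a_0^2 b_0^2 a_1^2 b_1^2$.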


In what follows we assume that all the string tuples which occur in term leaves belong to $\Theta^{(\le 1)}$.
Obviously, this constraint does not restrict the generative power of DCFGs.

Definition. A term is called linear if it contains zero or one occurrences of nonterminals. A grammar is linear 
if right sides of all its rules are linear terms. 

In this paper we study normal forms for linear DCFGs. The following result for DCFGs in general was
obtained in [1].

Theorem 1. Every $k$-DCFG is equivalent to some $k$-DCFG $G = (N, \Sigma, P, S)$ which has the rules only of
the following form:

1. $A \to B \cdot C$, where $A \in N - \{X\}$, $B, C \in N - \{S\}$,

2. $A \to B \odot_j C$, where $j \le k$, $A \in N - \{X\}$, $B, C \in N - \{S, X\}$,

3. $A \to a$, where $a \in \Sigma$,

4. $X \to (\varepsilon, \varepsilon)$,

5. $S \to \varepsilon$.

2 Normal forms for linear DCFGs 

A valuation may be extended to variables and nonterminals by assigning every variable an arbitrary word of 
appropriate rank. When the valuation is fixed, the value of a context is calculated just like the term value. Two 
contexts are equivalent if they have the same value under all valuations. Obviously, if we replace the right-hand 
term in a grammar rule by an equivalent term, the generated language does not change. Basic equivalences are
listed below:

Statement 1. The following ground multicontexts are equivalent:

1. $(x_1 \cdot x_2) \cdot x_3 \sim x_1 \cdot (x_2 \cdot x_3)$,

2. $(x_1 \cdot x_2) \odot_j x_3 \sim (x_1 \odot_j x_3) \cdot x_2$ if $j \le \mathrm{rk}(x_1)$,

3. $(x_1 \cdot x_2) \odot_j x_3 \sim x_1 \cdot (x_2 \odot_{j - \mathrm{rk}(x_1)} x_3)$ if $\mathrm{rk}(x_1) < j \le \mathrm{rk}(x_1) + \mathrm{rk}(x_2)$,

4. $(x_1 \odot_l x_2) \odot_j x_3 \sim (x_1 \odot_j x_3) \odot_{l + \mathrm{rk}(x_3) - 1} x_2$ if $j < l$,

5. $(x_1 \odot_l x_2) \odot_j x_3 \sim x_1 \odot_l (x_2 \odot_{j - l + 1} x_3)$ if $l \le j < l + \mathrm{rk}(x_2)$,

6. $(x_1 \odot_l x_2) \odot_j x_3 \sim (x_1 \odot_{j - \mathrm{rk}(x_2) + 1} x_3) \odot_l x_2$ if $j \ge l + \mathrm{rk}(x_2)$,

7. $(\varepsilon, \varepsilon) \odot_1 x_1 \sim x_1$,

8. $x_1 \odot_j (\varepsilon, \varepsilon) \sim x_1$ for any $j \le \mathrm{rk}(x_1)$.
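
The equivalences of Statement 1 are easy to test on random tuples. The following sketch is not a proof; it
only evaluates both sides under random valuations, assuming tuples are encoded as Python tuples of strings.
The helper names (`ic`, `rand_tuple`, `check_statement`) are ours.

```python
import random
from typing import Tuple

Word = Tuple[str, ...]


def rank(u: Word) -> int:
    return len(u) - 1


def concat(x: Word, y: Word) -> Word:
    return x[:-1] + (x[-1] + y[0],) + y[1:]


def ic(x: Word, y: Word, l: int) -> Word:
    """Intercalation x (.)_l y."""
    if len(y) == 1:
        return x[:l - 1] + (x[l - 1] + y[0] + x[l],) + x[l + 1:]
    return x[:l - 1] + (x[l - 1] + y[0],) + y[1:-1] + (y[-1] + x[l],) + x[l + 1:]


def rand_tuple(r: int, rng: random.Random) -> Word:
    """Random tuple of rank r over {a, b} with components of length 0..2."""
    return tuple("".join(rng.choice("ab") for _ in range(rng.randint(0, 2)))
                 for _ in range(r + 1))


def check_statement(trials: int = 200, seed: int = 0) -> bool:
    rng = random.Random(seed)
    for _ in range(trials):
        x1, x2, x3 = (rand_tuple(rng.randint(0, 3), rng) for _ in range(3))
        r1, r2, r3 = rank(x1), rank(x2), rank(x3)
        assert concat(concat(x1, x2), x3) == concat(x1, concat(x2, x3))  # item 1
        for j in range(1, r1 + r2 + 1):
            lhs = ic(concat(x1, x2), x3, j)
            if j <= r1:                                                  # item 2
                assert lhs == concat(ic(x1, x3, j), x2)
            else:                                                        # item 3
                assert lhs == concat(x1, ic(x2, x3, j - r1))
        for l in range(1, r1 + 1):
            inner = ic(x1, x2, l)
            for j in range(1, rank(inner) + 1):
                lhs = ic(inner, x3, j)
                if j < l:                                                # item 4
                    assert lhs == ic(ic(x1, x3, j), x2, l + r3 - 1)
                elif j < l + r2:                                         # item 5
                    assert lhs == ic(x1, ic(x2, x3, j - l + 1), l)
                else:                                                    # item 6
                    assert lhs == ic(ic(x1, x3, j - r2 + 1), x2, l)
        assert ic(("", ""), x1, 1) == x1                                 # item 7
        for j in range(1, r1 + 1):                                       # item 8
            assert ic(x1, ("", ""), j) == x1
    return True
```

The three branches for $(x_1 \odot_l x_2) \odot_j x_3$ correspond to the gap $j$ lying before, inside, or
after the inserted copy of $x_2$.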

Lemma 1. Every linear $k$-DCFG is equivalent to some $k$-DCFG with the rules only of the form

• $A \to uB$ or $A \to Bu$, $|u| \le 1$, $u \ne \varepsilon$,

• $A \to B \odot_j u$, $|u| \le 1$,

• $A \to u$, $|u| \le 1$.

Proof. Throughout the proof a well-formed term is defined as follows:

• A nonterminal or an element of $\Theta^{(\le 1)}$ is well-formed,

• If $\alpha$ is a well-formed term, then any $k$-correct term of the form $u\alpha$ or $\alpha u$, where
$u \in \Theta^{(\le 1)}$ and $u \ne \varepsilon$, is well-formed,

• If $\alpha$ is a well-formed term, then any $k$-correct term of the form $\alpha \odot_j u$, where
$u \in \Theta^{(\le 1)}$, is well-formed.

It is sufficient to prove that every linear term $\alpha$ is equivalent to some well-formed term. This is done
by induction on term construction using the basic equivalences and the fact that
$(u_0, \ldots, u_l) \odot_j \alpha \sim (u_0, \ldots, u_{j-1}) \, \alpha \, (u_j, \ldots, u_l)$ for any term
$\alpha$. □
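
The last fact used in the proof says that intercalating into a tuple of strings amounts to cutting that tuple
at the $j$-th gap and concatenating the pieces around the inserted value. A minimal sketch of this check,
with tuples encoded as Python tuples of strings and with hypothetical helper names, is:

```python
from typing import Tuple

Word = Tuple[str, ...]


def rank(u: Word) -> int:
    return len(u) - 1


def concat(x: Word, y: Word) -> Word:
    return x[:-1] + (x[-1] + y[0],) + y[1:]


def ic(x: Word, y: Word, l: int) -> Word:
    """Intercalation x (.)_l y."""
    if len(y) == 1:
        return x[:l - 1] + (x[l - 1] + y[0] + x[l],) + x[l + 1:]
    return x[:l - 1] + (x[l - 1] + y[0],) + y[1:-1] + (y[-1] + x[l],) + x[l + 1:]


def cut_and_glue(u: Word, a: Word, j: int) -> Word:
    """(u_0, ..., u_{j-1}) . a . (u_j, ..., u_l) for 1 <= j <= rk(u)."""
    return concat(concat(u[:j], a), u[j:])


def fact_holds(u: Word, a: Word) -> bool:
    """Compare u (.)_j a with the cut-and-glue form for every gap j of u."""
    return all(ic(u, a, j) == cut_and_glue(u, a, j) for j in range(1, rank(u) + 1))
```

Here `a` plays the role of the value of the term $\alpha$ under some valuation.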

In what follows we sometimes denote the tuple $(\varepsilon, \varepsilon)$ by $1$.

Lemma 2. Every linear $k$-DCFG $G$ is equivalent to some $k$-DCFG with the rules only of the form

• $A \to uB$ or $A \to Bu$, $|u| \le 1$, $u \ne \varepsilon$,

• $A \to B \odot_j u$, $|u| \le 1$,

• $A \to u$, $|u| = 1$,

• $S \to \varepsilon$.

Proof. The proof is analogous to $\varepsilon$-rules elimination in standard DCFGs. We assume that $G$ already
has the form introduced in the previous lemma. We want to create a new grammar with the set of rules $P'$ where
every nonterminal $A \ne S$ of rank $l$ generates the same tuples as in $G$ except
$\underbrace{(\varepsilon, \ldots, \varepsilon)}_{l+1\ \text{times}} = 1^l$. First we determine for every
nonterminal whether it generates the word $1^l$, where $l$ is its rank; such nonterminals are called
$\varepsilon$-generating. If $A$ generates only this word, then it is called strictly $\varepsilon$-generating.
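
Both properties can be computed by a standard fixpoint iteration. The sketch below is only one possible way
to do it and relies on our own simplified encoding: since every operation preserves the total length, a rule
is reduced to a triple `(A, B, n)` where `B` is the nonterminal on the right-hand side (or `None`) and `n`
is $|u|$. The names `least_set` and `classify` are hypothetical.

```python
from typing import Callable, Iterable, Optional, Set, Tuple

Rule = Tuple[str, Optional[str], int]  # (A, B or None, |u|): assumed encoding


def least_set(rules: Iterable[Rule],
              fires: Callable[[Rule, Set[str]], bool]) -> Set[str]:
    """Least set of rule heads closed under the condition `fires`."""
    rules = list(rules)
    found: Set[str] = set()
    changed = True
    while changed:
        changed = False
        for rule in rules:
            if rule[0] not in found and fires(rule, found):
                found.add(rule[0])
                changed = True
    return found


def classify(rules: Iterable[Rule]) -> Tuple[Set[str], Set[str]]:
    """Return (eps-generating, strictly eps-generating) nonterminals."""
    rules = list(rules)
    # nonterminals that derive at least one word
    productive = least_set(rules, lambda r, s: r[1] is None or r[1] in s)
    # A derives a word of length 0 iff some rule adds nothing to such a word
    eps = least_set(rules, lambda r, s: r[2] == 0 and (r[1] is None or r[1] in s))
    # A derives a word of positive length
    nonempty = least_set(
        rules,
        lambda r, s: (r[2] > 0 and (r[1] is None or r[1] in productive))
        or (r[1] is not None and r[1] in s),
    )
    return eps, eps - nonempty
```

For the rules `[("S", "A", 1), ("A", "B", 0), ("B", None, 0), ("A", None, 1)]` the function reports that `A`
and `B` are $\varepsilon$-generating and that only `B` is strictly $\varepsilon$-generating.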

We process every element of the old set of rules $P$ by the following algorithm (the subscript $l$ marks the
rank of $B$).

1. If the rule has the form $A \to B_l * u$, $* \in \mathrm{Op}_k$, and $B$ is not strictly
$\varepsilon$-generating, then this rule is added to $P'$.

2. If the rule has the form $A \to B_l u$, $|u| = 1$, and $B$ is $\varepsilon$-generating, then we also add the
rules obtained by binarizing the rule $A \to (\varepsilon, \varepsilon)^l u$.

3. If the rule has the form $A \to u B_l$, $|u| = 1$, and $B$ is $\varepsilon$-generating, then we also add the
rule $A \to u (\varepsilon, \varepsilon)^l$.

4. If the rule has the form $A \to B_l \odot_j u$, $|u| = 1$, and $B$ is $\varepsilon$-generating, then we also
add the rule $A \to (\varepsilon, \varepsilon)^{j-1} u (\varepsilon, \varepsilon)^{l-j}$.

5. We include in $P'$ all the rules in $P$ of the form $A \to u$, $|u| = 1$.

6. We also include the rule $S \to \varepsilon$ if $\varepsilon \in L(G)$.

The correctness of the constructed grammar is proved by standard induction on word length.

□ 


Lemma 3. Every linear $k$-DCFG $G$ is equivalent to some $k$-DCFG with the rules only of the form

• $A \to uB$ or $A \to Bu$, $|u| \le 1$, $u \ne \varepsilon$,

• $A \to B \odot_j u$, $|u| = 1$,

• $A \to u$, $|u| = 1$,

• $S \to \varepsilon$.

Proof. We assume that $G$ already satisfies Lemma 2. First we want to remove the rules of the form
$A \to B \odot_j \varepsilon$. To reach this goal we create for every nonterminal $B$ and every
$j \in [1; \mathrm{rk}(B)]$ its $j$-th bridge $B^j$ with the following properties: if $B$ generates the word
$(u_0, \ldots, u_{j-1}, u_j, \ldots, u_l)$, then $B^j$ generates the word
$(u_0, \ldots, u_{j-1} u_j, \ldots, u_l)$ and vice versa. Then we create bridges for the newly introduced
nonterminals and so on. Since the bridged nonterminal has lower rank than the initial one, this process will
terminate.

To satisfy the declared properties we extend the grammar with the following rules. The notation $u^j$ denotes
the word obtained from $u = (u_0, \ldots, u_{j-1}, u_j, \ldots, u_l)$ by removing the $j$-th gap. Here and to
the end of the paper a subscript on a nonterminal marks its rank.

1. For every rule $A \to uB$ we add the rule $A^j \to u^j B$ in case $j \le \mathrm{rk}(u)$ and the rule
$A^j \to u B^{j - \mathrm{rk}(u)}$ in case $j > \mathrm{rk}(u)$.

2. For every rule $A \to B_r u$ we add the rule $A^j \to B u^{j-r}$ in case $j > r$ and the rule
$A^j \to B^j u$ in case $j \le r$.

3. For every rule $A \to B \odot_l u$ we add the rule $A^j \to B^j \odot_{l-1} u$ in case $j < l$, the rule
$A^j \to B \odot_l u^{j-l+1}$ in case $l \le j < l + \mathrm{rk}(u)$ and the rule
$A^j \to B^{j - \mathrm{rk}(u) + 1} \odot_l u$ in case $j \ge l + \mathrm{rk}(u)$.

4. For every rule $A \to u$ we add the rule $A^j \to u^j$.

5. If the grammar contained the rule $S \to \varepsilon$, we preserve this rule.
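
The case distinctions in rules 1-3 mirror how the bridge operation commutes with concatenation and
intercalation on tuples. The sketch below tests these identities on random tuples; it assumes tuples are
encoded as Python tuples of strings, and `bridge`, `ic`, `rand_tuple` and `check_bridge_rules` are our own
illustrative names.

```python
import random
from typing import Tuple

Word = Tuple[str, ...]


def rank(u: Word) -> int:
    return len(u) - 1


def concat(x: Word, y: Word) -> Word:
    return x[:-1] + (x[-1] + y[0],) + y[1:]


def ic(x: Word, y: Word, l: int) -> Word:
    """Intercalation x (.)_l y."""
    if len(y) == 1:
        return x[:l - 1] + (x[l - 1] + y[0] + x[l],) + x[l + 1:]
    return x[:l - 1] + (x[l - 1] + y[0],) + y[1:-1] + (y[-1] + x[l],) + x[l + 1:]


def bridge(u: Word, j: int) -> Word:
    """u^j: remove the j-th gap by merging components j-1 and j."""
    return u[:j - 1] + (u[j - 1] + u[j],) + u[j + 1:]


def rand_tuple(r: int, rng: random.Random) -> Word:
    return tuple("".join(rng.choice("ab") for _ in range(rng.randint(0, 2)))
                 for _ in range(r + 1))


def check_bridge_rules(trials: int = 300, seed: int = 1) -> bool:
    rng = random.Random(seed)
    for _ in range(trials):
        b = rand_tuple(rng.randint(0, 3), rng)   # stands for a word of B
        u = rand_tuple(rng.randint(0, 2), rng)   # the tuple u of the rule
        # rules 1 and 2: the bridged gap lies in exactly one of the two factors
        for x, y in ((u, b), (b, u)):
            for j in range(1, rank(x) + rank(y) + 1):
                lhs = bridge(concat(x, y), j)
                if j <= rank(x):
                    assert lhs == concat(bridge(x, j), y)
                else:
                    assert lhs == concat(x, bridge(y, j - rank(x)))
        # rule 3: the gap lies before, inside, or after the inserted tuple u
        for l in range(1, rank(b) + 1):
            whole = ic(b, u, l)
            for j in range(1, rank(whole) + 1):
                lhs = bridge(whole, j)
                if j < l:
                    assert lhs == ic(bridge(b, j), u, l - 1)
                elif j < l + rank(u):
                    assert lhs == ic(b, bridge(u, j - l + 1), l)
                else:
                    assert lhs == ic(bridge(b, j - rank(u) + 1), u, l)
    return True
```

Note that `bridge(u, j)` coincides with `ic(u, ("",), j)`, which is why bridges can replace the rules of the
form $A \to B \odot_j \varepsilon$.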

Afterwards we replace every rule of the form $A \to B \odot_j \varepsilon$ with the rule $A \to B^j$. We also
replace all the rules of the form $A \to B \odot_j (\varepsilon, \varepsilon)$ by the rule $A \to B$ and then
eliminate unary rules by the standard procedure.

It remains to remove the rules of the form $A \to B \odot_j 1^l$ for $l \ge 2$. It is done analogously to the
previous step. On the set of tuples we define the $(j, l)$-split operation $u^{j,l}$, which inserts the tuple
$1^l$ into the $j$-th gap of $u$; this operation is naturally extended to languages. For every nonterminal $B$
we introduce its $(j, l)$-split $B^{j,l}$ (in case $\mathrm{rk}(B) + l \le k + 1$) which generates the
$(j, l)$-split of $L(B)$. We repeat this procedure until all nonterminals of rank less than $k$ have split
versions. It is done in just the same way as we have introduced the bridge nonterminals.
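
On tuples the split operation has a very simple effect: it inserts $l - 1$ empty components at the $j$-th
gap. The following sketch, assuming tuples are encoded as Python tuples of strings and using the hypothetical
names `ones`, `ic` and `split`, compares this direct description with intercalation of $1^l$.

```python
from typing import Tuple

Word = Tuple[str, ...]


def rank(u: Word) -> int:
    return len(u) - 1


def ic(x: Word, y: Word, l: int) -> Word:
    """Intercalation x (.)_l y."""
    if len(y) == 1:
        return x[:l - 1] + (x[l - 1] + y[0] + x[l],) + x[l + 1:]
    return x[:l - 1] + (x[l - 1] + y[0],) + y[1:-1] + (y[-1] + x[l],) + x[l + 1:]


def ones(l: int) -> Word:
    """1^l: the tuple of l + 1 empty strings."""
    return ("",) * (l + 1)


def split(u: Word, j: int, l: int) -> Word:
    """u^{j,l} for l >= 1: insert 1^l into the j-th gap of u."""
    if not (1 <= j <= rank(u) and l >= 1):
        raise ValueError("need 1 <= j <= rk(u) and l >= 1")
    return u[:j] + ("",) * (l - 1) + u[j:]
```

The rank grows by $l - 1$, which is where the side condition $\mathrm{rk}(B) + l \le k + 1$ comes from.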

Now we replace every rule of the form $A \to B \odot_j 1^l$ by the rule $A \to B^{j,l}$ and eliminate unary
rules as earlier. The lemma is proved. □

Finally, we want to eliminate tuples of length 0 altogether. For every natural $p$ we introduce a unary
operation $/_p$, which transforms a tuple of the form $u = v a 1^p$ to the tuple $u/_p = va$ in case
$a \in \Sigma$; otherwise this operation is undefined. Informally, it deletes the $p$ rightmost $\varepsilon$
components of the tuple provided the rightmost fragment of the obtained tuple is nonempty. The operation
$\backslash_p$ is defined symmetrically. Both operations are naturally extended from individual tuples to
languages.
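
As an illustration, the two operations can be written as partial functions on tuples. This is a sketch under
our own encoding (Python tuples of strings, `None` for "undefined"); the names `strip_right` and `strip_left`
are hypothetical.

```python
from typing import Optional, Tuple

Word = Tuple[str, ...]


def strip_right(u: Word, p: int) -> Optional[Word]:
    """u /_p: drop the p rightmost components if they are all empty and the
    component that becomes rightmost is nonempty; otherwise undefined."""
    n = len(u)
    if p < 0 or p >= n:
        return None
    if any(u[n - p:]) or not u[n - p - 1]:
        return None
    return u[:n - p]


def strip_left(u: Word, p: int) -> Optional[Word]:
    """The symmetric operation: drop the p leftmost empty components."""
    n = len(u)
    if p < 0 or p >= n:
        return None
    if any(u[:p]) or not u[p]:
        return None
    return u[p:]
```

For example, `strip_right(("ab", "c", "", ""), 2)` returns `("ab", "c")`, whereas
`strip_right(("ab", "", ""), 1)` is undefined because the new rightmost component would be empty.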

Theorem 2. Every linear $k$-DCFG $G$ is equivalent to some $k$-DCFG with the rules only of the form

• $A \to uB$ or $A \to Bu$, $|u| = 1$,

• $A \to B \odot_j u$, $|u| = 1$,

• $A \to u$, $|u| = 1$,

• $S \to \varepsilon$.

Proof. We assume that the initial grammar $G = (N, \Sigma, P, S)$ already satisfies Lemma 3. We set
$N' = \{A/_p \mid A \in N,\ p \le \mathrm{rk}(A)\}$, $S' = S/_0$ and construct the set $P'$ by the following
procedure:

1. For every rule of the form $A \to uB$ we add the rule $A/_p \to u B/_p$ for all possible $p$.

2. For every rule of the form $A \to B(1^q a 1^p)$ (every rule of the form $A \to Bu$ with $|u| = 1$ can be
expressed so) we add the rule $A/_p \to B(1^q a)$.

3. For every rule of the form $A \to B 1^q$ and every $p \ge q$, we add the rule $A/_p \to B/_{p-q}$.

4. For every rule of the form $A \to B \odot_j u$ and every $p \le \mathrm{rk}(B) - j$ we add the rule
$A/_p \to B/_p \odot_j u$.

5. For every rule of the form $A \to B \odot_j (1^q a 1^r)$ we add the rule $A/_p \to B \odot_j (1^q a)$ with
$p = r + (\mathrm{rk}(B) - j)$.

6. For every rule of the form $A \to a$ we add the rule $A/_0 \to a$.

7. If $(S \to \varepsilon) \in P$, then we also add the rule $S/_0 \to \varepsilon$.

It is straightforward to check that $L(A/_p) = (L(A))/_p$, hence $L(S/_0) = (L(S))/_0 = L(S)$ as required. We
have eliminated the rules of the form $A \to B 1^p$; the rules of the form $A \to 1^p B$ are removed
analogously. The theorem is proved. □